Gene Selection for Multi-Class Prediction of Microarray Data

Authors

  • Dechang Chen
  • Dong Hua
  • Jaques Reifman
  • Xiuzhen Cheng
Abstract

Gene expression data from microarrays have been successfully applied to class prediction, where the purpose is to classify and predict the diagnostic category of a sample from its gene expression profile. A typical microarray dataset consists of expression levels for a large number of genes on a relatively small number of samples. As a consequence, one basic and important question associated with class prediction is: how do we identify a small subset of informative genes that contribute the most to the classification task? Many methods have been proposed, but most focus on two-class problems, such as discrimination between normal and disease samples. This paper addresses the selection of informative genes for multi-class prediction problems by considering all the classes jointly. Our approach is based on the power of the genes to discriminate among the different classes (e.g., tumor types) and on the existing correlation between genes. We model the expression levels of a given gene by a one-way analysis of variance model with heterogeneity of variances, and determine the discriminatory power of the gene by a test statistic designed to test the equality of the class means. In other words, the discriminatory power of a gene is associated with a Behrens-Fisher problem. Informative genes are chosen such that each selected gene has high discriminatory power and the correlation between any pair of selected genes is low. The test statistics considered in this paper are the ANOVA test statistic, the Brown-Forsythe test statistic, the Cochran test statistic, and the Welch test statistic. Their performance is evaluated with several classification methods applied to two publicly available microarray datasets. The results show that the Brown-Forsythe test statistic achieves the best performance.
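
As an illustration of the selection idea described in the abstract, the following Python sketch scores each gene with the Brown-Forsythe statistic in its usual textbook form and then greedily keeps high-scoring genes whose absolute Pearson correlation with every already-selected gene stays at or below a cutoff. The abstract does not give the exact formula or the selection procedure, so the function names, the greedy loop, and the `max_abs_corr` value are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def brown_forsythe_stat(x, labels):
    """Brown-Forsythe statistic for equality of class means of one gene.

    Textbook form (assumed, not taken from the paper):
        sum_i n_i * (mean_i - grand_mean)^2 / sum_i (1 - n_i / N) * s_i^2
    Assumes every class has at least two samples and nonzero variance.
    """
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    n_total = x.size
    grand_mean = x.mean()
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        xc = x[labels == c]
        n_c = xc.size
        between += n_c * (xc.mean() - grand_mean) ** 2
        # Class variances are weighted separately (no pooling), which is
        # what allows heterogeneous variances across classes.
        within += (1.0 - n_c / n_total) * xc.var(ddof=1)
    return between / within


def select_genes(X, labels, n_genes=10, max_abs_corr=0.8):
    """Greedy selection (hypothetical procedure): high score, low correlation.

    X has shape (n_samples, n_genes_total). Returns the selected column
    indices and the score of every gene.
    """
    X = np.asarray(X, dtype=float)
    scores = np.array(
        [brown_forsythe_stat(X[:, j], labels) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]      # best-scoring genes first
    corr = np.corrcoef(X, rowvar=False)   # gene-by-gene Pearson correlation
    selected = []
    for j in order:
        # Keep a gene only if it is weakly correlated with all kept genes.
        if all(abs(corr[j, s]) <= max_abs_corr for s in selected):
            selected.append(int(j))
        if len(selected) == n_genes:
            break
    return selected, scores
```

Calling `select_genes(X, labels, n_genes=2)` on a samples-by-genes matrix returns the indices of two genes that score highly and are not strongly correlated with each other. Replacing `brown_forsythe_stat` with another statistic (for example the classical ANOVA F or the Welch statistic) leaves the selection loop unchanged.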

Similar resources

Prediction of blood cancer using leukemia gene expression data and sparsity-based gene selection methods

Background: DNA microarray is a useful technology that simultaneously assesses the expression of thousands of genes. It can be utilized for the detection of cancer types and cancer biomarkers. This study aimed to predict blood cancer using leukemia gene expression data and a robust ℓ2,p-norm sparsity-based gene selection method. Materials and Methods: In this descriptive study, the microarray ...

Diagnosis of Breast Cancer Subtypes using the Selection of Effective Genes from Microarray Data

Introduction: Early diagnosis of breast cancer and the identification of effective genes are important issues in the treatment and survival of the patients. Gene expression data obtained using DNA microarray in combination with machine learning algorithms can provide new and intelligent methods for diagnosis of breast cancer. Methods: Data on the expression of 9216 genes from 84 patients across...

Gene Identification from Microarray Data for Diagnosis of Acute Myeloid and Lymphoblastic Leukemia Using a Sparse Gene Selection Method

Background: Microarray experiments can simultaneously determine the expression of thousands of genes. Identification of potential genes from microarray data for diagnosis of cancer is important. This study aimed to identify genes for the diagnosis of acute myeloid and lymphoblastic leukemia using a sparse feature selection method. Materials and Methods: In this descriptive study, the expressio...

Improving MSVM-RFE for Multiclass Gene Selection

Along with the advent of DNA microarray technology, gene expression profiling has been widely used to study molecular signatures of many diseases and to develop molecular diagnostics for disease prediction. In class prediction problems using expression data, gene selection is essential to improve the prediction accuracy and to identify informative genes for a disease. In this paper we improve t...

SFLA Based Gene Selection Approach for Improving Cancer Classification Accuracy

In this paper, we propose a new gene selection algorithm, called SFLA-FS, based on the Shuffled Frog Leaping Algorithm. The proposed algorithm is used to improve cancer classification accuracy. Most biological datasets, such as cancer datasets, have a large number of genes and few samples. However, most of these genes are not useful for some tasks, for example cancer classification....

Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine

DNA microarray gene expression data provide a wealth of information on thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and the differences between tumors. In this study we try to reduce the high-dimensional data with a statistical method that selects valuable, high-impact genes as biomarkers, and then classify ovarian tumors based on gene expression data of...

Publication date: 2003